December 9, 2016

A scatter plot

library(ggplot2)
p <- qplot(x, y, data = bivar); p

A linear fit

p + geom_smooth(method = "lm")

A polynomial fit

p + geom_smooth(method = "lm", formula = y ~ poly(x, 5))

Another polynomial fit

p + geom_smooth(method = "lm", formula = y ~ poly(x, 20))

A spline smooth

p + geom_smooth(method = "gam", formula = y ~ s(x))

A loess smooth

p + geom_smooth(method = "loess")  ## The default for fewer than 1,000 observations

Smoothing with ggplot2

The geom_smooth function adds a model fit or a scatter plot smoother to the plot as an extra layer.

The stat_smooth function is responsible for delegating the computations to the fitting function. See ?stat_smooth.

Spline smoothing is performed by the gam function in the mgcv package, whereas loess smoothing is performed by the loess function in the stats package.

Any "smoother" can be used, provided that it supports a formula interface and has a predict method that follows the conventions of predict.lm.
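As a rough sketch of the two steps a smoother has to support, the fit and the prediction can be carried out by hand. The simulated data frame sim and the names grid and band are assumptions made for illustration and are not part of the slides, and the normal quantile 1.96 is a simplifying choice for the pointwise band.

```r
## Simulated data standing in for the data used on the slides (an assumption)
set.seed(1)
sim <- data.frame(x = runif(100, 0, 10))
sim$y <- sin(sim$x) + rnorm(100, sd = 0.3)

## Step 1: fit the model through the formula interface
fit <- lm(y ~ poly(x, 5), data = sim)

## Step 2: predict on an equally spaced grid over the range of x
grid <- data.frame(x = seq(min(sim$x), max(sim$x), length.out = 80))
pred <- predict(fit, newdata = grid, se.fit = TRUE)

## Fitted curve with an approximate pointwise 95% band
band <- data.frame(x = grid$x,
                   y = pred$fit,
                   ymin = pred$fit - 1.96 * pred$se.fit,
                   ymax = pred$fit + 1.96 * pred$se.fit)
head(band)
```

A custom smoother only needs to reproduce the fit and predict steps. The standard errors matter only when se = TRUE is requested.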

Running mean

An implementation that assumes the values of \(y\) are already sorted according to \(x\).

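runMean <- function(y, m) {
  n <- length(y)
  k <- 2 * m + 1             ## Window width; requires n >= k
  y <- y / k                 ## Pre-scale so that window sums are means
  s <- rep(NA, n)            ## The first and last m entries stay NA
  s[m + 1] <- sum(y[1:k])    ## Mean over the first full window
  for(i in seq_len(n - k) + m)  ## i = m + 1, ..., n - m - 1; empty if n == k
    ## Slide the window: drop the leftmost term, add the next one on the right
    s[i + 1] <- s[i] - y[i - m] + y[i + 1 + m]
  s
}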

(See the filter function in the stats package for a much faster alternative.)
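A minimal sketch of that alternative, assuming a centered window of width 2m + 1 as in runMean. The simulated vector y and the naive reference computation are illustrative assumptions. The call is written as stats::filter because dplyr, when attached, masks filter with its own function of the same name.

```r
set.seed(1)
y <- rnorm(50)   ## Made-up data, assumed to be in the correct order
m <- 3
k <- 2 * m + 1

## Centered moving average: equal weights 1 / k over a window of width k
s_filter <- as.numeric(stats::filter(y, rep(1 / k, k), sides = 2))

## Naive reference: the mean of each full window, NA where the window is cut off
s_naive <- sapply(seq_along(y), function(i) {
  if (i <= m || i > length(y) - m) NA else mean(y[(i - m):(i + m)])
})

all.equal(s_filter, s_naive)
```

As with runMean, the first and last m entries of the result are NA because no full window is available there.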

An interface for geom_smooth.

rMean <- function(..., data, m = 10) {
  ord <- order(data$x)  ## Reordering if necessary
  structure(list(x = data$x[ord], y = runMean(data$y[ord], m = m)), 
            class = "rMean")
}
predict.rMean <- function(object, newdata, ...) 
  approx(object$x, object$y, newdata$x)$y ## Linear interpolation
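To see how the interpolation treats the missing values at the ends of a running mean, here is a small sketch with made-up numbers (not from the slides). It assumes the default behavior of approx, which drops pairs with missing values and returns NA outside the range of the remaining x values.

```r
x <- 1:5
y <- c(NA, 2, 3, 4, NA)   ## NA at both ends, like a running mean with m = 1

## Evaluate at two boundary points and two interior points
out <- approx(x, y, xout = c(1, 2.5, 3, 5))$y
out
```

The two interior points are interpolated linearly, while the two boundary points come back as NA, so a curve built this way has no values within m observations of either end.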

A running mean

p + geom_smooth(method = "rMean", se = FALSE, n = 200)

Returning to the internet traffic data

library(readr)  ## For read_csv
library(dplyr)  ## For %>%, select, filter, mutate and sample_n
traffic <- read_csv("http://nielsrhansen.github.io/Dong/BUinternet.txt")
traffic <- traffic %>% 
  select(-url) %>%
  filter(size > 0) %>% 
  mutate(speed = size / time) %>%
  sample_n(10000)  ## Subsampling data

Faceting

p <- ggplot(traffic, aes(x = size, y = speed)) + 
  geom_point() + 
  scale_x_log10() + scale_y_log10() +
  facet_wrap(~ `machine name`)

Fewer machines

traffic <- traffic %>% 
  filter(`machine name` %in% c("animal", "beaker", "bugs", "bunsen"))
p <- ggplot(traffic, aes(x = size, y = speed)) + 
  geom_point() + 
  scale_x_log10() + scale_y_log10() +
  facet_wrap(~ `machine name`)

Adding smoothers to all plots

p + geom_smooth(se = FALSE) + 
  geom_smooth(method = "lm", se = FALSE, color = "red")

One machine

traffic <- traffic %>% 
  filter(`machine name` == "animal") 
p <- ggplot(traffic, aes(x = size, y = speed)) + 
  geom_point() + 
  scale_x_log10() + scale_y_log10() +
  geom_smooth(method = "rMean", se = FALSE, n = 200)

This time we use our own smoother together with the plotly package, which produces an interactive plot.

library(plotly)

Using plotly

ggplotly(p)